AITopics

2510.24145

Country: Asia > China (0.29)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Diagnostic Medicine (0.67)
Information Technology > Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Islam, Mohammad Saiful, Rakha, Mohamed Sami, Pourmajidi, William, Sivaloganathan, Janakan, Steinbacher, John, Miranskyy, Andriy

Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset

arXiv.org Artificial Intelligence Jan-6-2025

As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.
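As a rough sizing aid for the dataset described above, the following Python sketch works out how many cells a table of 39,365 rows by 117,448 columns holds and how much memory a dense in-memory copy might need. The helper name `dense_size_gib` and the assumption of a dense array of 4- or 8-byte floating-point values are illustrative only; the abstract does not state the dataset's storage format or data types.

```python
ROWS = 39_365      # rows reported in the abstract
COLS = 117_448     # columns reported in the abstract

def dense_size_gib(rows: int, cols: int, bytes_per_value: int) -> float:
    """Memory for a dense rows x cols array in GiB (assumed layout, not from the source)."""
    return rows * cols * bytes_per_value / 2**30

cells = ROWS * COLS
print(f"cells: {cells:,}")
print(f"float32: {dense_size_gib(ROWS, COLS, 4):.1f} GiB")
print(f"float64: {dense_size_gib(ROWS, COLS, 8):.1f} GiB")
```

Under these assumptions the table holds roughly 4.6 billion cells, so a dense float64 copy would need tens of GiB. Column subsetting or chunked loading may therefore be worth considering before running experiments on commodity hardware.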

anomaly, data mining, machine learning, (18 more...)

2411.09047

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Mar-11-2024

Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach

Kuang, Jinxi, Liu, Jinyang, Huang, Junjie, Zhong, Renyi, Gu, Jiazhen, Yu, Lan, Tan, Rui, Yang, Zengyin, Lyu, Michael R.

Due to the scale and complexity of cloud systems, a system failure can trigger an "alert storm", i.e., a massive number of correlated alerts. Although these alerts can be traced back to a few root causes, their overwhelming number makes manual handling infeasible. Alert aggregation is thus critical to help engineers concentrate on the root cause and facilitate failure resolution. Existing methods typically rely on semantic similarity or statistical methods to aggregate alerts. However, semantic similarity-based methods overlook the causal rationale of alerts, while statistical methods can hardly handle infrequent alerts. To tackle these limitations, we leverage external knowledge, i.e., the Standard Operation Procedure (SOP) of alerts, as a supplement. We propose COLA, a novel hybrid approach based on correlation mining and LLM (Large Language Model) reasoning for online alert aggregation. The correlation mining module effectively captures the temporal and spatial relations between alerts, measuring their correlations in an efficient manner. Subsequently, only uncertain pairs with low confidence are forwarded to the LLM reasoning module for detailed analysis. This hybrid design harnesses both statistical evidence for frequent alerts and the reasoning capabilities of computationally intensive LLMs, ensuring the overall efficiency of COLA in handling large volumes of alerts in practical scenarios. We evaluate COLA on three datasets collected from the production environment of a large-scale cloud platform. The experimental results show COLA achieves F1-scores from 0.901 to 0.930, outperforming state-of-the-art methods and achieving comparable efficiency. We also share our experience in deploying COLA in our real-world cloud system, Cloud X.
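The hybrid routing idea in this abstract (cheap statistics first, expensive reasoning only for uncertain pairs) can be illustrated with a minimal Python sketch. This is not COLA's implementation: the co-occurrence score, the thresholds `low`, `high`, and `min_support`, the time window, and the `reasoner` callback (a stand-in for an LLM call) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_scores(alerts, window=60.0):
    """Score alert-type pairs by how often they fire within `window` seconds.

    `alerts` is a list of (timestamp, alert_type) tuples (hypothetical format).
    """
    alerts = sorted(alerts)
    type_counts = Counter(kind for _, kind in alerts)
    co_counts = Counter()
    for i, (t_i, kind_i) in enumerate(alerts):
        for t_j, kind_j in alerts[i + 1:]:
            if t_j - t_i > window:
                break  # alerts are sorted, so later ones are even further away
            if kind_i != kind_j:
                co_counts[tuple(sorted((kind_i, kind_j)))] += 1
    scores = {}
    for a, b in combinations(sorted(type_counts), 2):
        rarer = min(type_counts[a], type_counts[b])
        scores[(a, b)] = min(1.0, co_counts[(a, b)] / rarer)
    return scores, type_counts

def route_pairs(alerts, reasoner, window=60.0, min_support=3, low=0.2, high=0.8):
    """Decide confident pairs statistically; forward the rest to `reasoner`."""
    scores, counts = pair_scores(alerts, window)
    decisions = {}
    for (a, b), score in scores.items():
        rare = min(counts[a], counts[b]) < min_support  # too little evidence
        if not rare and score >= high:
            decisions[(a, b)] = "aggregate"
        elif not rare and score <= low:
            decisions[(a, b)] = "separate"
        else:
            decisions[(a, b)] = reasoner(a, b)  # e.g. an LLM prompted with SOP text
    return decisions

# Usage with a placeholder reasoner that only records what it was asked.
asked = []
def stub_reasoner(a, b):
    asked.append((a, b))
    return "needs_review"

sample = [
    (0, "disk_full"), (5, "db_slow"),
    (1000, "disk_full"), (1004, "db_slow"),
    (2000, "disk_full"), (2003, "db_slow"), (2010, "rare"),
    (5000, "net_flap"), (6000, "net_flap"), (7000, "net_flap"),
]
decisions = route_pairs(sample, stub_reasoner)
```

In this sketch only pairs involving the infrequent `rare` alert reach the reasoner, which mirrors the stated motivation that statistical methods struggle with infrequent alerts.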

alert, knowledge, llm, (14 more...)

doi: 10.1145/3639477.3639745

2403.06485

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
Asia > China > Hong Kong (0.04)
North America > United States (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Industry: Information Technology > Services (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial Intelligence Jan-7-2024

Why does Prediction Accuracy Decrease over Time? Uncertain Positive Learning for Cloud Failure Prediction

Li, Haozhe, Ma, Minghua, Liu, Yudong, Zhao, Pu, Zheng, Lingling, Li, Ze, Dang, Yingnong, Chintalapati, Murali, Rajmohan, Saravan, Lin, Qingwei, Zhang, Dongmei

With the rapid growth of cloud computing, a variety of software services have been deployed in the cloud. To ensure the reliability of cloud services, prior studies focus on predicting failure instances (disk, node, switch, etc.). Once the output of prediction is positive, mitigation actions are taken to rapidly resolve the underlying failure. According to our real-world practice in Microsoft Azure, we find that the prediction accuracy may decrease by about 9% after retraining the models. The mitigation actions may result in uncertain positive instances, since they cannot be verified after mitigation, and these instances may introduce more noise while updating the prediction model. To the best of our knowledge, we are the first to identify this Uncertain Positive Learning (UPLearning) issue in the real-world cloud failure prediction scenario. To tackle this problem, we design an Uncertain Positive Learning Risk Estimator (Uptake) approach. Using two real-world datasets of disk failure prediction and conducting node prediction experiments in Microsoft Azure, which is a top-tier cloud provider that serves millions of users, we demonstrate Uptake can significantly improve the failure prediction accuracy by 5% on average.

failure prediction, prediction, uptake, (13 more...)

2402.00034

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial Intelligence Sep-30-2023

Log-based Anomaly Detection based on EVT Theory with feedback

Liu, Jinyang, Huang, Junjie, Huo, Yintong, Jiang, Zhihan, Gu, Jiazhen, Chen, Zhuangbin, Feng, Cong, Yan, Minzhi, Lyu, Michael R.

System logs play a critical role in maintaining the reliability of software systems. Numerous studies have explored automatic log-based anomaly detection and achieved notable accuracy on benchmark datasets. However, when applied to large-scale cloud systems, these solutions face limitations due to high resource consumption and lack of adaptability to evolving logs. In this paper, we present an accurate, lightweight, and adaptive log-based anomaly detection framework, referred to as SeaLog. Our method introduces a Trie-based Detection Agent (TDA) that employs a lightweight, dynamically growing trie structure for real-time anomaly detection. To enhance TDA's accuracy in response to evolving log data, we enable it to receive feedback from experts. Interestingly, our findings suggest that contemporary large language models, such as ChatGPT, can provide feedback with a level of consistency comparable to human experts, which can potentially reduce manual verification efforts. We extensively evaluate SeaLog on two public datasets and an industrial dataset. The results show that SeaLog outperforms all baseline methods in terms of effectiveness, runs 2X to 10X faster, and consumes only 5% to 41% of the memory resource.
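To make the idea of a dynamically growing trie with feedback more concrete, here is a minimal Python sketch. It is not SeaLog's Trie-based Detection Agent: the class name `LogTrie`, the whitespace tokenization, the rule that masks any token containing a digit, and the `feedback` callback (standing in for an expert or an LLM) are all assumptions chosen for illustration.

```python
class LogTrie:
    """Token-level trie of log lines considered normal (illustrative only)."""

    _END = object()  # sentinel key marking the end of a stored line

    def __init__(self):
        self.root = {}

    @staticmethod
    def _tokens(line):
        # Assumed masking rule: tokens containing digits are treated as variables.
        return ["<*>" if any(c.isdigit() for c in tok) else tok
                for tok in line.split()]

    def add(self, line):
        """Grow the trie so that `line` is recognised as normal."""
        node = self.root
        for tok in self._tokens(line):
            node = node.setdefault(tok, {})
        node[self._END] = True

    def is_known(self, line):
        """Return True if the masked token sequence is already stored."""
        node = self.root
        for tok in self._tokens(line):
            if tok not in node:
                return False
            node = node[tok]
        return self._END in node

    def detect(self, line, feedback=None):
        """Return True if `line` is flagged as anomalous.

        Unknown lines are passed to `feedback` when provided; a reply of
        "normal" adds the line to the trie instead of flagging it.
        """
        if self.is_known(line):
            return False
        if feedback is not None and feedback(line) == "normal":
            self.add(line)
            return False
        return True


trie = LogTrie()
trie.add("Connected to 10.0.0.1 port 22")

same_template = trie.is_known("Connected to 10.0.0.7 port 8080")
flagged = trie.detect("Kernel panic at 0xdead")
accepted = trie.detect("Session closed for user alice",
                       feedback=lambda line: "normal")
now_known = trie.is_known("Session closed for user alice")
```

Lookups cost time proportional to the number of tokens in a line, and the structure only grows when a new normal pattern is confirmed, which is one plausible reading of why a trie suits lightweight, real-time detection.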

anomaly detection, log message, template, (14 more...)

2306.05032

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > China > Hong Kong (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Aug-2-2023

Scaling Data Science Solutions with Semantics and Machine Learning: Bosch Case

Zhou, Baifan, Nikolov, Nikolay, Zheng, Zhuoxun, Luo, Xianghui, Savkovic, Ognjen, Roman, Dumitru, Soylu, Ahmet, Kharlamov, Evgeny

Industry 4.0 and Internet of Things (IoT) technologies unlock an unprecedented amount of data from factory production, posing big data challenges in volume and variety. In that context, distributed computing solutions such as cloud systems are leveraged to parallelise the data processing and reduce computation time. As cloud systems become increasingly popular, there is increased demand for users who were originally not cloud experts (such as data scientists and domain experts) to deploy their solutions on the cloud systems. However, it is non-trivial to address both the high demand for cloud system users and the excessive time required to train them. To this end, we propose SemCloud, a semantics-enhanced cloud system that couples a cloud system with semantic technologies and machine learning. SemCloud relies on domain ontologies and mappings for data integration, and parallelises the semantic data integration and data analysis on distributed computing nodes. Furthermore, SemCloud adopts adaptive Datalog rules and machine learning for automated resource configuration, allowing non-cloud experts to use the cloud system. The system has been evaluated in an industrial use case with millions of data, thousands of repeated runs, and domain users, showing promising results.

artificial intelligence, machine learning, semcloud, (16 more...)

2308.01094

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
Europe > Italy (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology > Services (0.47)
Information Technology > Software (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.92)

#artificialintelligence Mar-7-2023, 07:05:49 GMT

Sechrist Industries, Inc. Introduces New Cloud System

Sechrist Industries, Inc., the pioneer of monoplace hyperbaric chambers and air/oxygen mixers, has introduced its new, proprietary Sechrist Cloud System. Unique to Sechrist Industries customers, the Hyperbaric Information Tracking System keeps all key information about customers' Sechrist Monoplace Hyperbaric Systems at their fingertips, available online, 24/7. Company President Deepak Talati remarked: "Sechrist believes that coming up with solutions to make the workload for clinicians and technicians easier is important so that more time can be spent caring for patients. The Sechrist Cloud Hyperbaric Information Tracking System puts all key chamber information in one easy to access location eliminating the need for binders and paper. Our goal at Sechrist is to make record keeping paperless and always accessible. The Sechrist Cloud System is designed to provide our customers with more time for patient care and less time managing paper."

cloud system, introduce new cloud system, sechrist industry, (8 more...)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.37)

#artificialintelligence Jan-16-2023, 04:15:18 GMT

Construction Industry Top 10 Trends in the Next Decade

AEM presented 10 top trends for the future of building construction, among them alternative power, the electrification of compact equipment, autonomous machinery and sensors for increased safety. Referencing recent aviation fuel regulation plans, the California Air Resources Board's (CARB) ban on small engines on new equipment starting in 2024, the Environmental Protection Agency's (EPA) new greenhouse gas emissions rules for 2023–2026 passenger vehicles and light-duty trucks and the EPA's plan to reduce greenhouse gas emissions from heavy-duty trucks starting with 2027 models, the AEM whitepaper asserts that construction companies will see their fleets change over the next decade as well. Major corporations continue to invest in renewable energy like biofuels, solar and wind power, as construction companies and large contractors commit to net-zero impact pledges for new buildings and infrastructure. The United States' commitment to cutting carbon emissions by 50% by 2030 will spur "the electrification of many segments of the compact construction equipment market" over the next 10 years, according to AEM. Thanks to advanced 5G networks and cloud systems, equipment tracking will allow real-time visibility into productivity and maintenance on a jobsite, so operators and contractors can make sure they queue properly and have the most efficient job flow they can.

artificial intelligence, construction company, data fountain wall painting robot, (8 more...)

Country: North America > United States > California (0.26)

Industry:

Law > Environmental Law (1.00)
Construction & Engineering (1.00)
Government > Regional Government > North America Government > United States Government (0.94)
Energy > Renewable > Wind (0.57)

Technology: Information Technology > Artificial Intelligence > Robots (0.57)

#artificialintelligence Nov-12-2022, 20:02:34 GMT

Cloud Intelligence/AIOps – Infusing AI into Cloud Computing Systems - Microsoft Research

When legendary computer scientist Jim Gray accepted the Turing Award in 1999, he laid out a dozen long-range information technology research goals. One of those goals called for the creation of trouble-free server systems or, in Gray's words, to "build a system used by millions of people each day and yet administered and managed by a single part-time person." Gray envisioned a self-organizing "server in the sky" that would store massive amounts of data, and refresh or download data as needed. Today, with the emergence and rapid advancement of artificial intelligence (AI), machine learning (ML) and cloud computing, and Microsoft's development of Cloud Intelligence/AIOps, we are closer than we have ever been to realizing that vision--and moving beyond it. Over the past fifteen years, the most significant paradigm shift in the computing industry has been the migration to cloud computing, which has created unprecedented digital transformation opportunities and benefits for business, society, and human life.

aiop, cloud system, engineer, (11 more...)

Industry: Information Technology > Services (0.48)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence Jun-3-2022, 07:00:06 GMT

Leading edge computing companies of 2022

Edge computing refers to a solution where data processing, analysis and in some cases, actions, occur close to the place where the data originated. Edge computing often relies on a sporadic connection to cloud computing systems, although some setups similarly connect to nearby devices -- in which case the systems might be referred to as part of the Internet of Things (IoT). Edge computing solutions operate in circumstances where current cloud computing systems won't suffice, due to one or more of the following concerns: Wherever you encounter one or more of the above four constraints, you'll also find an example of an edge computing solution. Machines, such as autonomous cars or industrial robots, generate huge quantities of data and act with low latency. Some agricultural systems operate in areas that lack high-bandwidth network connections.

application, computing, twitter, (16 more...)

Country:

North America > United States (0.04)
North America > Aruba (0.04)

Industry:

Information Technology > Services (1.00)
Transportation > Ground > Road (0.49)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.34)